Abstract

Detection of a signal hidden by noise within a time series is an important
problem in many astronomical searches, e.g. for light curves containing the
contributions of periodic/semi-periodic components due to rotating objects
and all other astrophysical time-dependent phenomena. One of the most
popular tools for use in such studies is the periodogram, whose use in an
astronomical context is often not trivial. The optimal statistical properties of
the periodogram are lost in the case of irregular sampling of signals, which is a
common situation in astronomical experiments. Some of these properties are
recovered by the Lomb-Scargle (LS) technique, but at the price of theoretical
difficulties that can make its use unclear, and of algorithms that require
the development of dedicated software if a fast implementation is necessary.
Such problems would be irrelevant if the LS periodogram could be used 
to significantly improve the results obtained by approximated but simpler 
techniques. In this work we show that in many astronomical applications 
simpler techniques provide results similar to those obtainable with the LS 
periodogram. The meaning of the Nyquist frequency is also discussed in the 
case of irregular sampling. 
1. Introduction

The search for characteristic frequencies in astrophysical phenomena re- 
quires a careful analysis of the data with appropriate statistical tools. Given 
the simplicity of its use and the wide availability of efficient related software,
one of the most popular techniques for looking for periodicities within a time 
series is the periodogram technique. In astronomical applications, however, 
the use of this technique is not trivial. In fact, this tool exhibits its opti- 
mal properties only in the case of signals sampled on a regular time grid, a 
common situation in engineering applications but not always in astronomical 
experiments. The analysis of a periodogram in the case of irregular sampling
is often limited by the difficulty of fully determining its statistical properties.
This is a long-standing problem (e.g. Gottlieb et al. 1975) and there have been
many attempts to solve it. A partial solution has been found in the Lomb-Scargle
(LS) approach (Lomb 1976; Scargle 1982), but at the price of theoretical
difficulties that make its use unclear and, if a fast implementation is needed
(e.g. in the case of very long time series), the need for dedicated software.
Of course, this would not be a relevant issue if the LS periodogram could
be used to notably improve the results obtainable by the statistical analysis
of a time series. In this paper we argue that in astronomical applications
this is often not the case. We show how the negligible improvements obtained
with LS are offset by the ease of interpretation and clarity of the results 
provided by simpler techniques, which do not demand high computing power 
and/or complicated algorithms. 

In Sec. 2 the statistical analysis of sampled signals is addressed in the
case of regular sampling, where the mathematical notation and formalism
are also outlined. The problems and advantages of irregular sampling are
analyzed in Sec. 3. The real advantage of the LS periodogram with respect
to an approximate but simpler technique is considered in Sec. 4 on the
basis of theoretical arguments as well as numerical experiments based on
synthetic data and on an experimental time series. Finally, Sec. 5 presents our
conclusions.



2. Statistical analysis of regularly sampled signals 

If a signal $x(t)$ is sampled on a regular time grid with a constant time
step $\Delta t$, a time series $\{x_j\}_{j=0}^{N-1} = (x_0, x_1, \ldots, x_{N-1})$ is obtained^1. Often
the main problem is testing whether $x(t)$ is due only to a noise $n(t)$, or
whether some other component $s(t)$ is present, i.e. $x_j = s_j + n_j$. The most
popular approach consists of computing the periodogram $\{p_k\}_{k=0}^{N-1}$ at
$N$ equispaced frequencies $\{f_k\}_{k=0}^{N-1} = \{k/N\}$: $p_k = \frac{1}{N} |\hat{x}_k|^2$, with the discrete
Fourier transform (DFT) of $\{x_j\}$ being

$$\hat{x}_k = \sum_{j=0}^{N-1} x_j e^{-i 2\pi k j/N}, \quad k = 0, 1, \ldots, N-1, \qquad (1)$$

and $\{f_k\}$ being the Fourier frequencies. The original time series $\{x_j\}$ can be
recovered from $\{\hat{x}_k\}$ via

$$x_j = \frac{1}{N} \sum_{k=0}^{N-1} \hat{x}_k e^{i 2\pi k j/N}, \quad j = 0, 1, \ldots, N-1. \qquad (2)$$

^1 Typically it is assumed that $\Delta t = 1$.

In the case where $\{x_j\}$ is only noise, with $\{n_j\}$ a zero-mean, Gaussian, white-noise
stationary process with standard deviation $\sigma_n$, from Eq. (1) it can
be readily verified that, independently of $k$, $p_k/\sigma_n^2$ is given by the sum of
two squared independent, zero-mean, unit-variance, Gaussian random quantities.
As a consequence, the corresponding probability density function
(PDF) is the exponential distribution. Moreover, whenever $k \neq k'$ with
$k, k' = 0, 1, \ldots, N/2$, $p_k$ is independent of $p_{k'}$. Hence, the probability $\alpha$ that
at least one of the $p_k$ exceeds a level $L_{Fa}$ is

$$\alpha = 1 - \left[ 1 - e^{-L_{Fa}/\sigma_n^2} \right]^{N^*}. \qquad (3)$$

Through this quantity it is possible to fix a detection threshold $L_{Fa}$,

$$L_{Fa} = -\sigma_n^2 \ln \left[ 1 - (1 - \alpha)^{1/N^*} \right], \qquad (4)$$

corresponding to the level that one or more peaks due to the noise would
exceed with a prefixed probability $\alpha$ when a number $N^*$ of (statistically
independent) frequencies are inspected. Threshold $L_{Fa}$ is called the level
of false alarm.
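
As a minimal sketch of how the false-alarm threshold can be evaluated in practice, the Python fragment below computes the periodogram of a regularly sampled series and the level of Eq. (4). It assumes that Eq. (3) has the form $\alpha = 1 - [1 - e^{-L_{Fa}/\sigma_n^2}]^{N^*}$; the function names `periodogram`, `false_alarm_level` and `false_alarm_probability` are illustrative and do not come from the original work.

```python
import numpy as np


def periodogram(x):
    """Return p_k = |x_hat_k|^2 / N for a regularly sampled series (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.fft(x)) ** 2 / x.size


def false_alarm_level(alpha, sigma_n, n_indep):
    """Threshold of Eq. (4) for n_indep statistically independent frequencies."""
    return -sigma_n**2 * np.log(1.0 - (1.0 - alpha) ** (1.0 / n_indep))


def false_alarm_probability(level, sigma_n, n_indep):
    """Probability (assumed form of Eq. 3) that a noise peak exceeds `level`."""
    return 1.0 - (1.0 - np.exp(-level / sigma_n**2)) ** n_indep


# Usage: flag the frequencies of a noisy series whose power exceeds the threshold.
rng = np.random.default_rng(42)
N = 256
x = rng.normal(0.0, 1.0, size=N)
p = periodogram(x)[1 : N // 2]          # skip k = 0, keep k = 1 .. N/2 - 1
threshold = false_alarm_level(alpha=0.01, sigma_n=1.0, n_indep=p.size)
detections = np.flatnonzero(p > threshold) + 1
```

The choice of `n_indep` is left to the caller, since the number of independent frequencies depends on which part of the spectrum is inspected.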

For a periodic component with amplitude $A$, phase $\phi_l$ and frequency $f_l$ (in
units of $1/\Delta t$) in the set of the Fourier frequencies $\{f_k\}$, $s_j = A \sin(2\pi f_l j + \phi_l)$,
the periodogram will show a prominent peak at $k = l$. Indeed, since $\hat{x}_{N-k}$
is the complex conjugate of $\hat{x}_k$, and since $\cos[2\pi(N-k)j/N] = \cos[2\pi k j/N]$ and
$\sin[2\pi(N-k)j/N] = -\sin[2\pi k j/N]$, Eq. (2) can be written in the
form (Chu 2008)

$$x_j = \frac{1}{N} \sum_{k=0}^{N-1} \left[ a_k \cos\frac{2\pi k j}{N} + b_k \sin\frac{2\pi k j}{N} \right], \qquad (5)$$

where

$$a_k = \sum_{j=0}^{N-1} x_j \cos\frac{2\pi k j}{N}, \qquad (6)$$

$$b_k = \sum_{j=0}^{N-1} x_j \sin\frac{2\pi k j}{N}, \qquad (7)$$

or

$$a_k = \frac{\hat{x}_k + \hat{x}_{N-k}}{2}, \qquad (8)$$

$$b_k = i \, \frac{\hat{x}_k - \hat{x}_{N-k}}{2}. \qquad (9)$$

Now, since

$$A \sin\left( \frac{2\pi l j}{N} + \phi_l \right) \propto a_l \cos\frac{2\pi l j}{N} + b_l \sin\frac{2\pi l j}{N}, \qquad (10)$$

only the coefficients $\hat{x}_l$ and $\hat{x}_{N-l}$, and hence only $p_l = (a_l^2 + b_l^2)/N$, will be
different from zero. More generally, if $x_j = A \sin(2\pi f_* j + \phi) + n_j$, with $f_*$
close but not identical to the Fourier frequency $f_l$, the periodogram takes
the form of a squared "sinc" function centered at $f_*$. Also in this case, it is
expected that $p_l > L_{Fa}$ for small values of $\alpha$ (typically 0.05 or 0.01). If $s(t)$ is
semi-periodic or even non-periodic, the situation is more complicated since
more peaks are expected, but the basic idea does not change.
Regular sampling has many advantages, among them: 

• The sine and cosine modes corresponding to the Fourier frequencies 
constitute an orthonormal basis for signal {xj}. This makes operations 
such as noise filtering, separation and/or detection of components of 
interest easier; 

• The spectrogram can be shown to derive from the least-squares fit of
model (5) to the observed signal (e.g. see Vio et al. 2010). This provides
a physical interpretation of the quantity $p_k$ as the energy associated
with the component at frequency $f_k$;

• Under the pure noise hypothesis $x_j = n_j$ and independently of $k$, $a_k$
and $b_k$ are uncorrelated (independent) Gaussian quantities. As a consequence,
$p_k$ contains all the available information. In other words, the
use of the joint distribution of $a_k$ and $b_k$ does not provide any advantage
with respect to the use of $p_k$. Moreover, the quantities $\{p_k\}_{k=0}^{N/2}$
are mutually independent and have a known PDF. All of these facts
permit the development of simple and effective detection techniques;

• Quite efficient algorithms are available for the computation of $\{p_k\}$.

At the same time, however, it is necessary to stress that:

• The Fourier frequencies have no particular physical meaning. They
constitute a kind of natural frequencies that, however, are intrinsic to
the sampling characteristics and not to the signal under analysis. This
implies that the frequency of interest might not belong to such a set;

• If $x_j$ contains a sinusoidal component with frequency $f_u > f_{Ny} = 0.5$
(in units of $1/\Delta t$), the periodogram will show a peak at
a frequency $f = \mathrm{mod}(f_u, 2\pi) < f_{Ny}$^2. This puts an upper limit $f_{Ny}$,
the so-called Nyquist frequency, on the maximal frequency that can be
detected in a time series.

In conclusion, a regular sampling simplifies the analysis of the data as well as 
the development of efficient algorithms. However, especially in the context 
of exploratory data analysis, it suffers from some annoying limitations.
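
The Nyquist limitation of regular sampling can be illustrated with a short numerical check. The sketch below assumes a unit time step and uses the standard folding relation for that case (a sinusoid at frequency 0.7 is indistinguishable, up to a sign, from one at 1 - 0.7 = 0.3); the specific values of `N` and `f_true` are arbitrary choices made for the illustration.

```python
import numpy as np

N = 100
j = np.arange(N)                 # regular time grid with unit time step
f_true = 0.7                     # above the Nyquist frequency 0.5
f_alias = 1.0 - f_true           # folded frequency below 0.5

x_high = np.sin(2.0 * np.pi * f_true * j)
x_low = -np.sin(2.0 * np.pi * f_alias * j)   # same samples, opposite phase

# Periodogram of the high-frequency sinusoid, inspected up to the Nyquist index.
p = np.abs(np.fft.fft(x_high)) ** 2 / N
k_peak = int(np.argmax(p[: N // 2 + 1]))
f_peak = k_peak / N              # frequency at which the peak actually appears
```

On the regular grid the two sampled sequences coincide, so the periodogram places the peak at the folded frequency rather than at `f_true`.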

3. Periodogram analysis of irregularly sampled signals 

3.1. Statistical issues 

In astronomy the experimental conditions often do not permit a regular
sampling of signals, and this has the following consequences. First, it is no longer possible
to define a set of natural frequencies (such as the Fourier frequencies) for
which to compute the periodogram. Hence, there is no reason for the number
of frequencies to be equal to the number $M$ of the sampling time instants
$t_0, t_1, \ldots, t_{M-1}$. Therefore, we write the transformation corresponding to that
given by Eq. (1) in the general form

$$\hat{x}_f = \sum_{j=0}^{M-1} x_{t_j} e^{-i 2\pi f t_j}, \qquad (11)$$

where, without loss of generality, we have $t_0 = 0$. The spectrogram is still
defined as $p_f = |\hat{x}_f|^2/M$. Similarly, Eqs. (6)-(7) become

$$a_f = \sum_{j=0}^{M-1} x_{t_j} \cos 2\pi f t_j, \qquad (12)$$

$$b_f = \sum_{j=0}^{M-1} x_{t_j} \sin 2\pi f t_j, \qquad (13)$$

and

$$p_f = \frac{a_f^2 + b_f^2}{M}. \qquad (14)$$

^2 The function $z = \mathrm{mod}(x, y)$ provides the remainder $z$ from the division of $x$ by $y$.
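
A direct evaluation of the coefficients $a_f$, $b_f$ and of the periodogram for arbitrary sampling times can be sketched as follows. This is only an illustration of Eqs. (12)-(14) under the assumption that the frequency grid is supplied by the user; the function name `irregular_periodogram` is hypothetical.

```python
import numpy as np


def irregular_periodogram(t, x, freqs):
    """Evaluate a_f, b_f (Eqs. 12-13) and p_f (Eq. 14) by direct summation.

    t     : sampling times (any spacing)
    x     : samples x_{t_j}
    freqs : frequencies at which the periodogram is wanted
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    freqs = np.atleast_1d(np.asarray(freqs, dtype=float))
    phase = 2.0 * np.pi * np.outer(freqs, t)   # shape (n_freqs, M)
    a = np.cos(phase) @ x
    b = np.sin(phase) @ x
    return a, b, (a**2 + b**2) / t.size


# On a regular grid and at the Fourier frequencies this reduces to the DFT result.
rng = np.random.default_rng(1)
N = 32
x_reg = rng.normal(size=N)
_, _, p_direct = irregular_periodogram(np.arange(N), x_reg, np.arange(N) / N)
p_fft = np.abs(np.fft.fft(x_reg)) ** 2 / N
```

The cost grows with the product of the number of samples and the number of frequencies, which is the inefficiency discussed later in the text.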



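Second, the quantity $p_f$ loses its physical meaning and it no longer provides
the energy of a signal at frequency $f$. Indeed, for a given $f$, $p_f$ can be
obtained from the least-squares problem (Stoica et al. 2009)

$$p_f = M |\beta_f|^2, \qquad (15)$$

$$\beta_f = \arg\min_{\beta} \sum_{j=0}^{M-1} \left| x_{t_j} - \beta e^{i 2\pi f t_j} \right|^2, \qquad (16)$$

since it is readily verified that $\beta_f = \hat{x}_f/M$. If $\beta_f$ is expressed in the polar form
$\beta_f = |\beta_f| e^{i \phi_f}$, then the least-squares problem (16) can be rewritten in the
form

$$\beta_f = \arg\min \left\{ \sum_{j=0}^{M-1} \left[ x_{t_j} - |\beta_f| \cos(2\pi f t_j + \phi_f) \right]^2 + |\beta_f|^2 \sum_{j=0}^{M-1} \sin^2(2\pi f t_j + \phi_f) \right\}. \qquad (17)$$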



The first term in this equation represents the least-squares fit of a sinusoidal
function, and it can have a physical meaning. The second term represents
a data-independent quantity with no meaning in the context of the model
fit. Therefore, Eq. (17) indicates that in the case of irregular sampling the
periodogram is not equivalent to the least-squares fit of sinusoidal functions
(see also Vio et al. 2010). Consequently, the coefficients $a_f$ and $b_f$ given by
Eqs. (12)-(13) do not provide the corresponding amplitudes. Since Eq. (11)
can be interpreted as the correlation between $x_{t_j}$ and the sine and cosine
modes with frequency $f$, the periodogram becomes a simple statistical measure
of similarity between the experimental time series and a discrete sinusoidal
signal of frequency $f$.

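Another issue linked to irregular sampling is the fact that, even under
the hypothesis of a noise signal with $M = N$, although still with a Gaussian
PDF, $a_f$ and $b_f$ are no longer uncorrelated. As a consequence, the quantities
$p_f/\sigma_n^2$ no longer have an exponential PDF. This problem has been solved by
Lomb (1976) and Scargle (1982). Their approach, however, is a bit tortuous.
A more intuitive, though equivalent, method is based on the least-squares
model (Stoica et al. 2009; Vio et al. 2010):

$$(a_f, b_f) = \arg\min_{a, b} \sum_{j=0}^{M-1} \left[ x_{t_j} - a \cos(2\pi f t_j) - b \sin(2\pi f t_j) \right]^2. \qquad (18)$$

The solution of this problem is

$$\begin{pmatrix} a_f \\ b_f \end{pmatrix} = R_f^{-1} r_f, \qquad (19)$$

where

$$R_f = \sum_{j=0}^{M-1} \begin{pmatrix} \cos(2\pi f t_j) \\ \sin(2\pi f t_j) \end{pmatrix} \begin{pmatrix} \cos(2\pi f t_j) & \sin(2\pi f t_j) \end{pmatrix}, \qquad (20)$$

$$r_f = \sum_{j=0}^{M-1} \begin{pmatrix} \cos(2\pi f t_j) \\ \sin(2\pi f t_j) \end{pmatrix} x_{t_j}. \qquad (21)$$

The energy $P_f$ associated with frequency $f$ is given by

$$P_f = \sum_{j=0}^{M-1} \left[ a_f \cos(2\pi f t_j) + b_f \sin(2\pi f t_j) \right]^2 \qquad (22)$$

$$\quad = \begin{pmatrix} a_f & b_f \end{pmatrix} R_f \begin{pmatrix} a_f \\ b_f \end{pmatrix} \qquad (23)$$

$$\quad = r_f^T R_f^{-1} r_f. \qquad (24)$$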

In the case of a time series of a Gaussian, zero-mean, white noise $\{n_{t_j}\}_{j=0}^{M-1}$
with variance $\sigma_n^2$, from Eq. (21) it is easily verified that the entries of the
array $r_f$ are Gaussian, zero-mean, random quantities with covariance matrix
$\sigma_n^2 R_f$. Since $R_f$ is a positive definite matrix, it can be factorized in the
form $R_f = R_f^{T/2} R_f^{1/2}$, with $R_f^{1/2}$ the Cholesky factor of $R_f$ (Björck
1996). Therefore, the entries of the array $r^*_f = R_f^{-T/2} r_f/\sigma_n$ are independent
Gaussian random quantities with unit variance and the PDF of $P_f/\sigma_n^2$ is
the exponential distribution. However, there is no guarantee that, whenever
$f \neq f'$, $P_f$ is independent of $P_{f'}$. In general it is not, since with the least-squares
model (18) a single sinusoid of frequency $f$ is fitted at a time. As
a consequence, in the expression for the threshold $L_{Fa}$ as given by Eq. (4)
the number of frequencies $N^*$ should be substituted by the number $N_f < N$
of independent frequencies. The point is that $N_f$ is not known in advance.
In principle, $N_f$ can be obtained from the rank of the covariance matrix
$R_f$, but this procedure can be computationally quite expensive. However,
as stressed by Scargle (1982), the dependence of $L_{Fa}$ on $N_f$ is rather weak
and in many situations $N_f = M/2$ provides a reasonable choice (e.g. see
Vio et al. 2010).
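
The least-squares formulation lends itself to a compact numerical sketch. The fragment below assumes the reading of Eqs. (20), (21) and (24) in which $R_f$ is the 2x2 matrix of sums of products of the cosine and sine modes, $r_f$ collects their products with the data, and $P_f = r_f^T R_f^{-1} r_f$; the function name `ls_periodogram` is illustrative, and frequencies for which $R_f$ is singular (e.g. $f = 0$) must be excluded by the caller.

```python
import numpy as np


def ls_periodogram(t, x, freqs):
    """Least-squares periodogram P_f = r_f^T R_f^{-1} r_f, one frequency at a time."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    freqs = np.atleast_1d(np.asarray(freqs, dtype=float))
    power = np.empty(freqs.size)
    for i, f in enumerate(freqs):
        c = np.cos(2.0 * np.pi * f * t)
        s = np.sin(2.0 * np.pi * f * t)
        R = np.array([[c @ c, c @ s], [c @ s, s @ s]])   # matrix R_f
        r = np.array([c @ x, s @ x])                      # array r_f
        power[i] = r @ np.linalg.solve(R, r)              # r^T R^{-1} r
    return power


# Usage: a noiseless sinusoid on irregular times is fitted exactly at its frequency,
# so the fitted energy equals the energy of the data.
rng = np.random.default_rng(3)
t_irr = np.sort(rng.uniform(0.0, 50.0, size=40))
f_sig = 0.23
x_irr = 3.0 * np.cos(2.0 * np.pi * f_sig * t_irr) + 2.0 * np.sin(2.0 * np.pi * f_sig * t_irr)
P_sig = ls_periodogram(t_irr, x_irr, [f_sig])[0]
```

With this normalization, on a regular grid and at the Fourier frequencies $0 < k < N/2$ the matrix $R_f$ is diagonal and $P_f$ is simply twice the classical $p_k$.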



Before concluding this section, a final remark concerns the advisability
of working with mean-subtracted signals. If the mean value $\bar{x}$ of a signal is
different from zero, Eqs. (12)-(13) imply that its contributions $\bar{a}_f$ and $\bar{b}_f$ to
the coefficients $a_f$ and $b_f$ are given by

$$\bar{a}_f = \bar{x} \sum_{j=0}^{M-1} \cos 2\pi f t_j, \qquad (25)$$

$$\bar{b}_f = \bar{x} \sum_{j=0}^{M-1} \sin 2\pi f t_j. \qquad (26)$$

From these equations it appears that, independently of $f$, both $\bar{a}_f$ and $\bar{b}_f$ are
different from zero. In other words, $\bar{x}$ influences the entire periodogram and
not only at the frequency $f = 0$ as in the case of regular
sampling. Moreover, the contribution is different for distinct frequencies
and, since for a given $f$ it is $E[\bar{a}_f \bar{b}_f] \neq 0$, with $E[\cdot]$ the expectation operator,
a spurious correlation is introduced between $a_f$ and $b_f$. Obviously, all
this makes the spectral analysis of the signal of interest more complicated.
Actually, if the time series is not too short, the mean-subtraction operation
does not imply particular problems. In other cases, modifications
such as the "floating-mean periodogram" have to be used. For a detailed
discussion of this question see Cumming et al. (1999); Reegen (2007);
Zechmeister & Kürster (2009); Vio et al. (2010).



3.2. Considerations about the Nyquist frequency 

Data with irregular sampling carry information that can be exploited
in many ways. One of the benefits of uneven sampling is the drastic
reduction of frequency aliasing (i.e. the aliasing of high frequencies down
to lower ones). In other words, it is possible to identify periodic components
with frequencies much higher than the $f_{Ny}$ corresponding to that of a time
series with an identical number of equispaced data spanning the same time
interval. When the frequency index $k > N/2$, it can be written as
$k = N/2 + k'$. Then,

$$\sin\left( \frac{2\pi k j}{N} \right) = \sin\left( \frac{2\pi k' j}{N} + \pi j \right) = (-1)^j \sin\left( \frac{2\pi k' j}{N} \right), \qquad (27)$$

and similarly for the cosine function. Hence, $p_k = p_{k'}$. As a consequence, a
sinusoidal component with frequency index $k$ will produce a prominent peak
in the periodogram also at $k' < k$. At the same time, a sinusoidal component
with frequency index $k'$ will produce a prominent peak in the periodogram
also at $k > k'$. Using a periodogram it is not possible to determine whether
a sinusoidal component is present in the signal with frequency index $k$ or $k'$.
In the case of irregular sampling Eq. (27) does not hold. This implies
that the periodogram can be used to distinguish a sinusoid with frequency $f$
from another one with frequency $f'$ also when $f' > f_{Ny}$. In particular,
Eyer & Bartholdi (1999) found that, if the sampling time grid is in the form

$$t_j = q_j \, \delta t, \qquad (28)$$

with $q_j$ integer numbers and $\delta t$ the greatest common divisor for all $t_j$, then

$$f_{Ny} = \frac{1}{2 \, \delta t}. \qquad (29)$$

This is explained as follows: if the sampling pattern is in the form given by
Eq. (28), then from $\{x_{t_j}\}$ it is possible to obtain an even time series $\{x'_l\}_{l=0}^{M_g}$,
where $M_g = t_{M-1}/\delta t$ and

$$x'_l = \begin{cases} x_{t_j} & \text{if } l = q_j, \\ 0 & \text{otherwise.} \end{cases} \qquad (30)$$

The Nyquist frequency for this time series is given by Eq. (29). A formula
for its calculation is given in Koen (2006).
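
When the sampling times are known as integer multiples of some elementary unit, the common step and the corresponding Nyquist frequency can be obtained with a greatest-common-divisor computation. The sketch below assumes the relation $f_{Ny} = 1/(2\,\delta t)$ and works on time differences with respect to the first sample, which is an assumption made here to allow a non-zero starting time; the function name `nyquist_from_ticks` and the "tick" unit are hypothetical.

```python
import numpy as np


def nyquist_from_ticks(ticks, tick_length=1.0):
    """Nyquist frequency 1 / (2 * delta_t) for times given as integer ticks.

    ticks       : sampling times as integers (units of `tick_length`)
    tick_length : duration of one tick in the desired time unit
    """
    ticks = np.asarray(ticks, dtype=np.int64)
    step = np.gcd.reduce(ticks - ticks[0])   # largest common step, in ticks
    if step == 0:
        raise ValueError("at least two distinct sampling times are required")
    return 1.0 / (2.0 * step * tick_length)


# Times 0, 3, 9, 12 share a common step of 3 ticks; adding a sample at 13
# reduces the common step to 1 tick and raises the Nyquist frequency.
f_ny_coarse = nyquist_from_ticks([0, 3, 9, 12])
f_ny_fine = nyquist_from_ticks([0, 3, 9, 12, 13])
```

Working with integers avoids the ill-defined notion of a greatest common divisor of floating-point numbers.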

If the sampling pattern cannot be expressed in the form (28), then $\delta t = 0$.
In this case there is the surprising result that $f_{Ny} = \infty$. For example, this
happens when the sampling times are randomly and uniformly distributed
in the interval $[0, T]$. If $x(t) = \sin(2\pi f_0 t + \phi)$ it is possible to show that the
expected values of $a_f$ and $b_f$ are, respectively,

$$E_t[a_f] = \frac{M}{2T} \left( \frac{\cos[2\pi(f - f_0)T - \phi] - \cos[\phi]}{2\pi(f - f_0)} - \frac{\cos[2\pi(f + f_0)T + \phi] - \cos[\phi]}{2\pi(f + f_0)} \right), \qquad (31)$$

$$E_t[b_f] = \frac{M}{2T} \left( \frac{\sin[2\pi(f - f_0)T - \phi] + \sin[\phi]}{2\pi(f - f_0)} - \frac{\sin[2\pi(f + f_0)T + \phi] - \sin[\phi]}{2\pi(f + f_0)} \right), \qquad (32)$$

if $f \neq f_0$ and

$$E_t[a_{f_0}] = \frac{M}{2T} \left( \frac{\cos[\phi] - \cos[4\pi f_0 T + \phi]}{4\pi f_0} + T \sin[\phi] \right), \qquad (33)$$

$$E_t[b_{f_0}] = \frac{M}{2T} \left( \frac{\sin[\phi] - \sin[4\pi f_0 T + \phi]}{4\pi f_0} + T \cos[\phi] \right), \qquad (34)$$

if $f = f_0$. For increasing values of $T$, $E_t[a_{f_0}] \to M \sin(\phi)/2$ and $E_t[b_{f_0}] \to
M \cos(\phi)/2$. The equations that provide the expected standard deviations
$\sigma_{a_f}$ and $\sigma_{b_f}$ are very long but, for $T$ sufficiently large with respect to $1/f$,
both these quantities are approximately equal to $\sqrt{M}/2$. This implies that
the uniform random sampling introduces a noise that, however, becomes
rapidly negligible for increasing values of $M$. The remarkable point is that
these results are independent of the frequency $f_0$. Hence, $f_0$ can be arbitrarily
large. As an example, Fig. 1 shows $E_t[a_f]$ and $E_t[b_f]$ together with the
corresponding standard deviations $\sigma_{a_f}$ and $\sigma_{b_f}$ for the case where $f_0 = 1$
(i.e. twice the Nyquist frequency corresponding to the mean sampling time
step), $\phi = 0$, $M = 50$ and $T = M - 1$. For comparison, the results obtained
from 500 numerical simulations are also displayed. In Fig. 2 the theoretical
$E_t[a_f]$ and $E_t[b_f]$ are compared with the result from a single simulation.
Finally, in Fig. 3 the corresponding periodograms are shown as well as the
corresponding standard deviation as obtained from the numerical simulations
(the expected values of this quantity give rise to very long equations).
From these results, one could argue that it is possible to detect a periodic
component independently of its frequency. However, from the analysis of
Eqs. (31)-(32) it can be inferred that the width of the peak at frequency $f_0$
is inversely proportional to $T$. As a consequence, if $T$ is large, the peak will
be quite narrow and there is a concrete risk of missing it if the periodogram
is not computed for a sufficiently large number of frequencies. In addition,
for very high frequencies, a periodogram can be deeply altered by even small
errors in the sampling times $t_j$ (see below).
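
The possibility of detecting a frequency above the nominal Nyquist limit with random sampling can be checked with a small simulation. The sketch below is not the experiment of Figs. 1-3: it assumes $M = 200$ noiseless samples (instead of 50) drawn uniformly in $[0, M-1]$, a sinusoid at $f_0 = 1$, and a frequency grid oversampled by a factor of 10 with respect to $1/T$ so that the narrow peak is not missed.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 200
f0 = 1.0                       # twice the Nyquist frequency of the mean time step
T = M - 1.0
t = np.sort(rng.uniform(0.0, T, size=M))   # random, uniformly distributed times
x = np.sin(2.0 * np.pi * f0 * t)

# Dense frequency grid: the peak width is of order 1/T, so sample 10 times finer.
freqs = np.arange(0.01, 2.0, 1.0 / (10.0 * T))
phase = 2.0 * np.pi * np.outer(freqs, t)
a = np.cos(phase) @ x          # coefficients a_f
b = np.sin(phase) @ x          # coefficients b_f
p = (a**2 + b**2) / M          # periodogram p_f

f_peak = freqs[np.argmax(p)]   # location of the highest peak
```

A coarser frequency grid (for instance a step of several times $1/T$) may step over the peak altogether, which is the practical risk mentioned above.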

3.3. Computational issues 

A difficulty introduced by irregular sampling is the lack of efficiency of
the algorithm for the computation of $\{\hat{x}_f\}$. Indeed, algorithms based on the
fast Fourier transform (FFT) are inapplicable and the direct implementation
of Eq. (11) requires an operation count of order $MN$, which is computationally
quite inefficient. The solutions proposed for overcoming this problem are
based on algorithms/techniques that are not trivial (e.g. Press et al. 2007;
Keiner et al. 2008), which makes it difficult to deal with experimental signals
if the computation of $\{p_f\}$ or of the coefficients $\{\hat{x}_f\}$ represents only one
step in the analysis procedure. For example, after filtering in the frequency
domain, it could be necessary to Fourier invert the sequence $\{\hat{x}_f\}$. In the
case of irregular sampling, an inversion similar to that given by Eq. (2) does
not exist. A simple solution that makes things easier consists of rebinning
the original sampled signal onto an arbitrarily dense regular time grid. According
to this approach, the time interval $[t_0, t_{M-1}]$ is divided into a number
$\mathcal{M} - 1 \gg M$ of subintervals (bins) centered at the time instants $\{\tau_l\}_{l=0}^{\mathcal{M}-1}$. A new
time series $x_{\tau_0}, x_{\tau_1}, \ldots, x_{\tau_{\mathcal{M}-1}}$ is obtained by assigning each $t_j$ to the nearest
bin, i.e. by setting $x_{\tau_l} = x_{t_j}$ if $\tau_l$ is the time instant closest to $t_j$, and zero
otherwise. More specifically, if an array $\{x_{\tau_l}\}$ of $\mathcal{M}$ zeros is created, index $l_j$
is given by

$$l_j = \mathrm{round}\left[ \frac{t_j - t_0}{\Delta\tau} \right], \qquad (35)$$

where $\mathrm{round}[t]$ is the operator that provides the integer closest to $t$. In this
way a grid of $\mathcal{M}$ time instants, regularly spaced with a time step $\Delta\tau =
(t_{M-1} - t_0)/(\mathcal{M} - 1)$, is obtained, but in the resulting time series $\{x_{\tau_l}\}$ some
of the entries are equal to zero. The FFT algorithm can be directly applied to
this time series and the LS periodogram computed through Eqs. (8)-(9) and
(22)-(24). Intuitively, this approach may be expected to provide satisfactory
results if the differences $\{\delta\tau_l\} = \{t_j - \tau_l\}$ are reasonably small with respect
to the periods of interest.
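
The rebinning procedure can be sketched in a few lines. The fragment below assumes that the bin index is obtained by rounding $(t_j - t_0)/\Delta\tau$ and that samples falling in the same bin are summed (the text does not specify how collisions are handled, and they are rare when the grid is much denser than the data); the function names `rebin_to_grid` and `rebinned_periodogram` are illustrative.

```python
import numpy as np


def rebin_to_grid(t, x, n_grid):
    """Place irregular samples on a regular grid of n_grid points (zeros elsewhere)."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    t0 = t.min()
    dtau = (t.max() - t0) / (n_grid - 1)          # regular time step
    idx = np.rint((t - t0) / dtau).astype(int)    # nearest grid index for each t_j
    grid = np.zeros(n_grid)
    np.add.at(grid, idx, x)                       # assumption: colliding samples are summed
    return grid, dtau


def rebinned_periodogram(t, x, n_grid):
    """FFT-based approximation of p_f = |x_hat_f|^2 / M on the rebinned grid."""
    grid, dtau = rebin_to_grid(t, x, n_grid)
    p = np.abs(np.fft.rfft(grid)) ** 2 / len(t)   # normalized by the number of data M
    freqs = np.fft.rfftfreq(n_grid, d=dtau)
    return freqs, p


# Usage: 100 random times in [0, 10000] rebinned onto a grid with unit step.
rng = np.random.default_rng(7)
t_obs = np.sort(rng.uniform(0.0, 10000.0, size=100))
x_obs = np.sin(2.0 * np.pi * 0.003 * t_obs)
freqs, p_approx = rebinned_periodogram(t_obs, x_obs, n_grid=10001)
```

Summing colliding samples keeps the FFT result equal to the direct sum of the data multiplied by the modes evaluated at the grid times, which is the quantity analyzed in the error estimate that follows.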

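To quantify this assertion, let us suppose, without loss of generality, that
the signal under study is a sinusoid $x_{t_j} = \sin(2\pi f t_j + \phi)$ which is rebinned
in such a way as to obtain a time series $x_{\tau_l} = \sin[2\pi f(\tau_l + \delta\tau_l) + \phi]$. Let us suppose
also that $\{\delta\tau_l\}$ are randomly distributed in the interval $[-0.5\Delta\tau, 0.5\Delta\tau]$
with $E[\delta\tau_l] = 0$. From Eqs. (12)-(13)

$$\tilde{a}_f = \sum_{l=0}^{M-1} \sin[2\pi f(\tau_l + \delta\tau_l) + \phi] \cos(2\pi f \tau_l); \qquad (36)$$

$$\tilde{b}_f = \sum_{l=0}^{M-1} \sin[2\pi f(\tau_l + \delta\tau_l) + \phi] \sin(2\pi f \tau_l). \qquad (37)$$

Now, if the terms $\sin[2\pi f(\tau_l + \delta\tau_l) + \phi]$ are expanded up to the linear term,
one obtains

$$\tilde{a}_f = \sum_{l=0}^{M-1} \left[ \sin(2\pi f \tau_l + \phi) + 2\pi f \delta\tau_l \cos(2\pi f \tau_l + \phi) \right] \cos(2\pi f \tau_l); \qquad (38)$$

$$\tilde{b}_f = \sum_{l=0}^{M-1} \left[ \sin(2\pi f \tau_l + \phi) + 2\pi f \delta\tau_l \cos(2\pi f \tau_l + \phi) \right] \sin(2\pi f \tau_l), \qquad (39)$$

or

$$\tilde{a}_f = a_f + \sum_{l=0}^{M-1} 2\pi f \delta\tau_l \cos(2\pi f \tau_l + \phi) \cos(2\pi f \tau_l); \qquad (40)$$

$$\tilde{b}_f = b_f + \sum_{l=0}^{M-1} 2\pi f \delta\tau_l \cos(2\pi f \tau_l + \phi) \sin(2\pi f \tau_l), \qquad (41)$$

where $a_f$ and $b_f$ are the coefficients obtained for $\delta\tau_l = 0$.
If the time grid $\{\tau_l\}$ is fixed it results that $E_{\delta\tau}[\tilde{a}_f] = a_f$ and $E_{\delta\tau}[\tilde{b}_f] = b_f$.
Moreover, if it is assumed that the quantities $\delta\tau_l$ are distributed independently
and identically from a uniform PDF as well as independently of $\{\tau_l\}$, it
happens that

$$\sigma_{\tilde{a}_f} = \frac{2\pi f \Delta\tau}{\sqrt{12}} \left[ \sum_{l=0}^{M-1} \cos^2(2\pi f \tau_l + \phi) \cos^2(2\pi f \tau_l) \right]^{1/2}; \qquad (42)$$

$$\sigma_{\tilde{b}_f} = \frac{2\pi f \Delta\tau}{\sqrt{12}} \left[ \sum_{l=0}^{M-1} \cos^2(2\pi f \tau_l + \phi) \sin^2(2\pi f \tau_l) \right]^{1/2}. \qquad (43)$$

As expected, from this result it is evident that the error introduced by the
rebinning operation is proportional to the product of the frequency $f$ and
the sampling time step $\Delta\tau$. Hence, an accuracy to any desired precision can
be obtained if $\Delta\tau$ is chosen sufficiently small. In practical applications, such a
choice does not represent a critical step: once the largest frequency $f_{\max}$ of
interest (in units of $1/\Delta\tau$) is set, it is sufficient that $\Delta\tau \ll 1/f_{\max}$.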

For illustrative purposes, Fig. 4 shows the results of a numerical simulation
where a sinusoid $x_{t_j} = \sin(2\pi f_0 t_j)$ is sampled on 100 times $\{t_j\}_{j=0}^{99}$ that
are randomly and independently generated from a uniform distribution in
the interval $T = [0, 10000]$ (in free units). The time instants $\{t_j\}$ have been
rebinned on a regular time grid $[0, 10000]$ by setting $\tau_l = \mathrm{round}[t_j]$. Proceeding
in this way, the time instant $\tau_l$ approximates the corresponding $t_j$ with a
precision of four digits and At = 10'^. The frequency /o is considered in the 
interval [10~^,0.1] in units of (Ar)~^. From this figure it is evident that the 
linear approximation in Eqs. (40)-(41) holds up to frequencies of about 0.01.
However, both the approximated coefficients $\{\tilde{b}_f\}$ as well as the approximated
periodogram $\tilde{p}_f$ are within a few percent of the true value up to
a frequency of 0.1 (for the coefficients $\{\tilde{a}_f\}$ similar results hold). It is worth
stressing that $f_0 = 0.1$ is a rather high frequency with respect to a mean
$\Delta t \approx 100$.

4. Is the Lomb-Scargle periodogram really advantageous? 

In Sec. 3.3 it has been shown that the LS periodogram can be computed
with accuracy to any desired precision without the necessity of dedicated
algorithms/software. At this point, assuming that the error of the approximation
is negligible or even that an exact algorithm has been used, one can
go one step further and wonder whether, to test the statistical significance of
a peak, the decorrelation of the coefficients $a_f$ and $b_f$, which is at the heart of
the LS periodogram, is really a necessary operation. Using arguments based
on the spectral windows $W_f = \sum_{j=0}^{M-1} \exp(-i 2\pi f t_j)$, Vio et al. (2010) have
suggested that this is not the case since the correlation coefficient $\rho$ between
$a_f$ and $b_f$ is typically close to zero. Here, to support this claim we follow a
different approach. If $x_{t_j} = n_{t_j}$, with $\{n_{t_j}\}$ the realization of a discrete,
zero-mean white-noise process with standard deviation $\sigma_n$, then $E_n[a_f] = 0$,
$E_n[b_f] = 0$ and

$$\rho = \frac{E_n[a_f b_f]}{\sigma_{a_f} \sigma_{b_f}}, \qquad (44)$$

where

$$E_n[a_f b_f] = \sigma_n^2 \sum_{j=0}^{M-1} \cos(2\pi f t_j) \sin(2\pi f t_j) \qquad (45)$$

$$\quad = \frac{\sigma_n^2}{2} \sum_{j=0}^{M-1} \sin(4\pi f t_j). \qquad (46)$$

In the regular sampling case, it results that $E_n[a_f b_f] = 0$ and consequently
$\rho = 0$. The same does not hold in the irregular sampling case. However,
since $\sin(4\pi f t_j)$ is an odd function, one may expect that $E_n[a_f b_f] \approx 0$ and
hence $\rho \approx 0$ if the angles $\{\alpha_j\} = 4\pi f \{t_j\}$ on the unit circle are uniformly
and/or symmetrically distributed. In practical applications this condition is
not infrequently met. For example, in the case of $M$ independent sampling
time instants randomly and uniformly distributed in the interval $[0, T]$, one
finds that the expected correlation coefficient $\rho_{t, a_f b_f}$ is given by

$$\rho_{t, a_f b_f} = \frac{E_t\{E_n[a_f b_f]\}}{\sigma_{t, a_f} \sigma_{t, b_f}} \qquad (47)$$

$$\quad = \frac{1 - \cos(4\pi f T)}{\sqrt{(4\pi f T)^2 - \sin^2(4\pi f T)}}, \qquad (48)$$

where $\sigma_{t, a_f}$ and $\sigma_{t, b_f}$ denote the standard deviations with respect to the time
instants $\{t_j\}$. From this equation it is clear that $\rho_{t, a_f b_f}$ goes rapidly to zero
for increasing values of $T$. For fixed times, a formal proof is difficult since it
is strictly dependent on the specific sampling pattern. However, it is improbable
that the combination of the frequencies $f$ and the times $\{t_j\}$ makes the
distribution of the angles $\{\alpha_j\}$ strongly nonuniform and/or asymmetric. For
example, high values of $\rho$ can be obtained if all the angles $\alpha_j$ are distributed
in an interval $[\alpha^* - \epsilon\pi, \alpha^* + \epsilon\pi] \subset [0, \pi]$ with $\alpha^* \in [0, \pi]$ a given angle and $\epsilon$
a real number that takes its value in the interval $\min[\alpha^*/\pi, 1 - \alpha^*/\pi]$. The
condition for this to happen is

$$4\pi f t_j = \alpha^* + [2\kappa\pi \pm \epsilon\pi], \qquad (49)$$

with $\kappa$ an integer, or

$$t_j = \frac{\alpha^*}{4\pi f} + \frac{\kappa}{2f} \pm \frac{\epsilon}{4f}. \qquad (50)$$

From this equation it results that 1) the sampling pattern must consist
of times distributed in equispaced time intervals with the same duration,
which is proportional to $\epsilon$; 2) such a sampling pattern is specific to each
frequency $f$. This means that, even in the case where the sampling is such
as to produce a high $\rho$ for a given frequency, the same may not be true for
other frequencies. This is not a rigorous demonstration of the fact that a
nonuniform and irregular distribution of the angles $\alpha_j$ is improbable. In fact,
combinations of sampling times, lags and frequencies are possible that can
do the job. However, the considerations above suggest that things have to
conspire to produce remarkable effects.

To support these conclusions, in Figs. 5-10 the results of a few numerical
simulations are presented. In particular, Figs. 5 and 6 show the histograms of
the time instants corresponding to two sets of simulated sampling patterns
ranging from regular to extremely irregular sampling. The reason for making
such a choice is to verify that high values of $\rho$ are not linked to the degree
of irregularity of the sampling. For the first set, the time instants have
been generated starting from a grid of time instants $\{\tilde{t}_j\}$ regularly spaced
in the interval $[-1, 1]$ and then setting $t_j = \{\mathrm{sign}[\tilde{t}_j](\mathrm{abs}[\tilde{t}_j])^{\gamma}(M - 1) +
1\}/2$. For the second set, the starting regular grid is $\tilde{t}_j \in [0, 1]$ and $t_j =
\tilde{t}_j^{\gamma}(M - 1)$. Here, $\mathrm{sign}[\cdot]$ is the sign function, $\mathrm{abs}[\cdot]$ indicates absolute value
and $\gamma$ is a positive real number. In both cases $\gamma = 1$ corresponds to an
equispaced time grid. In the numerical experiment it has been assumed
that $\sigma_n = 1$ and several values of $\gamma$ have been tested. Figs. 7 and 9 show the
corresponding correlation coefficients. The values of $M = 100$ and $M = 1000$
have been taken as cases of a small and of a larger data set, respectively. In
both cases, the median time sampling step for the different values of $\gamma$ lies
approximately within the interval (0.4, 1.1). A set of frequencies has been
examined in the range [0.1, 3.0]. From these figures it is clear that, in spite of
the extremely irregular sampling under examination, significant correlation
between $a_f$ and $b_f$ happens only for the small data set. Even in this case, the
correlation is weak (< 0.2). The fact that $\rho$ depends on the distribution of
the angles $\alpha_j$ is supported by Figs. 8 and 10, where the distribution of the angles
$\{\alpha_j\}$ on the unit circle is shown for the case $M = 100$ and $\gamma = 2.5$. It is
evident that also with this limited number of data, the distribution of the
$\alpha_j$ is approximately uniform and symmetric. For the case $M = 1000$, the
distribution (not shown here) is even more regular and indeed, as visible in
Figs. 7 and 9, the corresponding correlation coefficients are closer to zero.

Figures 11-12 show the results concerning a few sampling patterns of more
astronomical interest. In particular, for each value of $\eta$, chosen in the range
[0.1, 0.9], 500 sampling time instants $t_j$ have been generated according to

$$t_j = (j - 1) + (100\eta - 99)/99 \times \mathrm{mod}(j - 1, 100), \quad j = 1, 2, \ldots, 500. \qquad (51)$$

In this way, five equispaced observing sessions of duration $100\eta$ are simulated,
each containing 100 equispaced data and covering a total fraction $\eta$ of the
interval [0, 500]. Adjacent sessions are separated by a gap of length $100(1 -
\eta)$. Fig. 11 shows the correlation coefficients $\rho$ for a set of frequencies $f$
corresponding to different values of $\eta$. Again, most of them are small. Only
for $\eta = 0.1$ (i.e. very large gaps) and $f = 0.02$ does $\rho$ attain the value $\approx 0.7$.
Fig. 12 shows the distribution of the angles $\{\alpha_j\}$ on the unit circle computed
for $\eta = 0.1$. Such a significant correlation is due to a sampling pattern of the
type given by Eq. (50). A significant correlation for a given frequency does
not imply that the same holds for other frequencies.
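
The correlation coefficient between $a_f$ and $b_f$ under white noise depends only on the sampling times and on the frequency, so it can be evaluated without simulating any noise. The sketch below assumes $\rho = \sum_j \cos\theta_j \sin\theta_j / [\sum_j \cos^2\theta_j \sum_j \sin^2\theta_j]^{1/2}$ with $\theta_j = 2\pi f t_j$, which follows from Eqs. (44)-(46) because the noise variance cancels; the function names `session_times` and `ab_correlation` are illustrative.

```python
import numpy as np


def session_times(eta):
    """Sampling pattern of Eq. (51): five sessions of 100 equispaced data each."""
    j = np.arange(1, 501)
    return (j - 1) + (100.0 * eta - 99.0) / 99.0 * np.mod(j - 1, 100)


def ab_correlation(t, f):
    """Correlation coefficient between a_f and b_f for pure white noise."""
    theta = 2.0 * np.pi * f * np.asarray(t, dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    return (c @ s) / np.sqrt((c @ c) * (s @ s))


# Scan a few frequencies for a pattern with very large gaps (eta = 0.1).
t_gaps = session_times(0.1)
freqs = np.array([0.02, 0.05, 0.1, 0.5, 1.0])
rho = np.array([ab_correlation(t_gaps, f) for f in freqs])
```

Since the computation costs only a few vector products per frequency, it can be run on the actual sampling pattern of an observation before deciding whether the decorrelation step is needed.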

4.1. Analysis of an experimental time series 

For demonstration purposes, we check what happens in the case of an 
experimental time series with periodic gaps. As explained above, this is a sit- 
uation more favorable for a nonuniform distribution of the angles aj. In this 
regard, the LS periodogram and the version as given in Eq. f[T^ are compared 
in the case of the light curve of the low mass X-ray binary EXO 0748—67 6 



a source which shows an orbital period of 3.82 hr (iParmar et al. 1 ll986l ). 
This object shows 8.3 minute X-ray eclipses every orbital period, irregu- 
lar dipping activity (energy-dependent absorption) and type I X-ray bursts. 
EXO 0748-676 has been extensively studied with the X-ray observatory
XMM-Newton. In particular, it was observed on seven occasions during
September-November 2003 for a total exposure time of 570 ks. For each
observation, data were acquired simultaneously with all of the on-board
instruments. Here, we present the light curve of the optical/UV monitor
(Mason et al. 2001) during 12-13 November 2003. The data were originally
taken with a sampling of 500 ms. These data are quite noisy and the light 
curve was rebinned to a sampling of 32 s to increase the signal-to-noise ratio 
per bin. The sampling of this signal, shown in the top panel of Fig. 13,
is regular but some periodic gaps are present. Since these gaps are rather
short, we have considered two other situations where larger periodic gaps are
obtained by removing 20% and 70% of the data, as shown in the central and
bottom panels of Fig. 13, respectively. In this way a time series with regular
sampling and wider periodic gaps is obtained. In spite of the presence of
large gaps, from Fig. 14 it clearly appears that the periodogram (14) and the
LS periodogram, when computed for the mean-subtracted signal, are quite
similar. As is visible in Fig. 15, the same is not true without subtraction of
the sample mean. This is not surprising since, as shown above, the mean
value introduces a spurious correlation between the $a_f$ and $b_f$ coefficients.
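The effect of the mean can be illustrated on synthetic data. The sketch below does not use the EXO 0748-676 light curve. It builds a hypothetical gapped series (unit time step, five sessions, 70% of each block of 100 samples removed) containing a sinusoid plus a constant offset. The "classic" periodogram is assumed here to have the Schuster form $(1/N)\,|\sum_j x_j e^{-i 2\pi f t_j}|^2$, which may differ from Eq. (14) by a normalisation, and the Lomb-Scargle periodogram is written in the form given by Scargle (1982). All function names are illustrative.

```python
import math


def classic_periodogram(t, x, f):
    """Schuster-type periodogram (1/N) |sum_j x_j exp(-i 2 pi f t_j)|^2 (assumed form)."""
    w = 2.0 * math.pi * f
    c = sum(xj * math.cos(w * tj) for tj, xj in zip(t, x))
    s = sum(xj * math.sin(w * tj) for tj, xj in zip(t, x))
    return (c * c + s * s) / len(t)


def lomb_scargle(t, x, f):
    """Lomb-Scargle periodogram in the form given by Scargle (1982)."""
    w = 2.0 * math.pi * f
    # Time offset tau that makes the shifted sine and cosine regressors orthogonal
    tau = math.atan2(sum(math.sin(2 * w * tj) for tj in t),
                     sum(math.cos(2 * w * tj) for tj in t)) / (2 * w)
    cos_t = [math.cos(w * (tj - tau)) for tj in t]
    sin_t = [math.sin(w * (tj - tau)) for tj in t]
    xc = sum(xj * cj for xj, cj in zip(x, cos_t))
    xs = sum(xj * sj for xj, sj in zip(x, sin_t))
    return 0.5 * (xc ** 2 / sum(cj ** 2 for cj in cos_t)
                  + xs ** 2 / sum(sj ** 2 for sj in sin_t))


# Synthetic gapped series: keep the first 30 samples of each block of 100
t = [float(100 * k + m) for k in range(5) for m in range(30)]
x = [10.0 + math.sin(2.0 * math.pi * 0.1 * tj) for tj in t]  # offset + sinusoid
mean = sum(x) / len(x)
x0 = [xj - mean for xj in x]  # mean-subtracted version

for f in (0.005, 0.1):
    print(f"f = {f}")
    print("  mean subtracted: classic =", classic_periodogram(t, x0, f),
          " LS =", lomb_scargle(t, x0, f))
    print("  mean retained:   classic =", classic_periodogram(t, x, f),
          " LS =", lomb_scargle(t, x, f))
```

In this toy case the two estimators coincide at the signal frequency once the mean is removed, whereas with the offset retained they differ markedly at low frequencies where the sampling window has power. The frequencies printed are chosen only to make this visible.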



5. Final remarks and conclusions 

In this paper we have addressed the problems related to the spectral
analysis of unevenly sampled time series. We have reexamined, with formalized
arguments and some numerical experiments, the pros and cons of even vs.
uneven sampling.

1. A regular data sampling simplifies the analysis as well as the develop- 
ment of efficient algorithms. However, it permits one to retrieve only 
the frequencies characteristic of the signal that are smaller than the 
Nyquist frequency; 

2. An irregular sampling introduces some computational as well as statistical
problems but it permits one to retrieve information about frequencies
even much greater than the Nyquist frequency;

3. Although from the theoretical point of view techniques specific to the
spectral analysis of unevenly sampled signals, such as the Lomb-Scargle
periodogram, could be of some interest, their effectiveness in practical
astronomical applications is limited. Indeed, approximate but simpler
techniques are able to provide similar results and are easier to use and
to modify to deal with situations different from those for which the
original LS periodogram was developed.
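Points 1 and 2 can be illustrated with a small numerical experiment. The sketch below is an illustration under stated assumptions, not the code used for the paper: the periodogram is taken in the Schuster form, the signal is a noise-free sinusoid at a hypothetical frequency f0 = 0.8 (above the Nyquist frequency 0.5 of a unit-step grid), and the irregular sampling consists of instants drawn uniformly at random with the same mean step.

```python
import math
import random


def classic_periodogram(t, x, f):
    """Schuster-type periodogram (1/N) |sum_j x_j exp(-i 2 pi f t_j)|^2 (assumed form)."""
    w = 2.0 * math.pi * f
    c = sum(xj * math.cos(w * tj) for tj, xj in zip(t, x))
    s = sum(xj * math.sin(w * tj) for tj, xj in zip(t, x))
    return (c * c + s * s) / len(t)


f0 = 0.8        # signal frequency, above the Nyquist frequency of the regular grid
f_alias = 1.0 - f0
n = 200

# Regular sampling with unit step
t_reg = [float(j) for j in range(n)]
x_reg = [math.sin(2.0 * math.pi * f0 * tj) for tj in t_reg]

# Irregular sampling: same number of points, uniform random instants in [0, n]
rng = random.Random(0)
t_irr = sorted(rng.uniform(0.0, n) for _ in range(n))
x_irr = [math.sin(2.0 * math.pi * f0 * tj) for tj in t_irr]

p_reg_true = classic_periodogram(t_reg, x_reg, f0)
p_reg_alias = classic_periodogram(t_reg, x_reg, f_alias)
p_irr_true = classic_periodogram(t_irr, x_irr, f0)
p_irr_alias = classic_periodogram(t_irr, x_irr, f_alias)

print("regular:   P(f0) =", p_reg_true, " P(alias) =", p_reg_alias)
print("irregular: P(f0) =", p_irr_true, " P(alias) =", p_irr_alias)
```

On the regular grid the periodogram takes the same value at f0 and at its alias 1 - f0, so the two frequencies cannot be told apart. With random sampling instants the alias is suppressed and the peak at f0 stands out, at the cost of a noisier background.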

Before concluding, it is necessary to stress that in Astronomy spectral
analysis can often be safely used only as a test to check whether a time series
contains a signal of interest or consists only of noise. For example, for
both regular and irregular sampling, the periodogram cannot provide
a reliable statistical characterization of a red noise signal if the experimental
time series spans an interval shorter than the time scale of the signal itself.
Moreover, in the presence of irregular sampling and independently of the
technique used, the periodogram cannot be used to identify the frequencies
of a periodic signal because of the "interference" between the true peaks
and those due to sampling (Deeming 1975). In this case, other techniques
are necessary (e.g. Roberts et al. 1987; Foster 1995; Bourguignon et al.
2007).



Some software code and data can be made available upon request. 
Figure 1: Theoretical vs. estimated values of the coefficients $a_f$, $b_f$ and of the
corresponding standard deviations $\sigma_{a_f}$, $\sigma_{b_f}$ (cf. Eqs. (31), (32), (33), (34)),
for a signal $x_{t_j} = \sin(2\pi f_0 t_j)$, $j = 0, 1, \ldots, 99$, when the sampling time
instants $t_j$ are uniformly and independently distributed in the interval [0, 99].
Here, $f_0 = 1$ in units of the mean $\Delta t$ (= 1). The
estimated quantities are based on the mean of 500 different numerical experiments.

Figure 2: Theoretical vs. estimated values of the coefficients $a_f$, $b_f$ and of the
corresponding standard deviations $\sigma_{a_f}$, $\sigma_{b_f}$ (cf. Eqs. (31), (32), (33), (34)),
for a signal $x_{t_j} = \sin(2\pi f_0 t_j)$, $j = 0, 1, \ldots, 99$, when the sampling time
instants $t_j$ are uniformly and independently distributed in the interval [0, 99].
Here, $f_0 = 1$ in units of the mean $\Delta t$ (= 1) and the
estimated quantities are based only on a single numerical simulation.

Figure 4: Numerical experiment to test the effects of rebinning of an irregular time series
on a regular time grid. Here, $x_{t_j} = \sin(2\pi f_0 t_j)$, $j = 0, 1, \ldots, 99$, with the sampling
time instants independently and uniformly distributed in the interval [0, 10000] and then
rounded to the nearest integer. A number = 5 x 10^ of equispaced frequencies are
considered in the set [1/N, 2/N, ..., 0.5]. Top-left panel: linearly approximated vs. true
b fg . The first 10'^ frequencies are plotted in green; Top-right panel: corresponding absolute
errors. The expected standard deviation interval derived from the linear approximation
is plotted in red; Bottom-left panel: corresponding relative errors; Bottom-right panel:
relative errors of the corresponding periodogram.

Figure 5: Distribution of the first set of irregular sampling time instants used to test
the effects of the rebinning operation on the accuracy of the computed Lomb-Scargle
periodogram. When $\gamma = 1$ the sampling is regular; it becomes more and more irregular
as $\gamma \to 0$ or $\gamma \to \infty$. Here the case with M = 1000 sampling time instants is shown.
The distribution for the case M = 100 is similar.



[Plot panels labelled $\gamma$ = 0.25, 0.3, 0.4, 0.5, 0.75, 1 and 1.5; horizontal axis ticks at 500 and 1000.]



Figure 6: Distribution of the second set of irregular sampling time instants used to test
the effects of the rebinning operation on the accuracy of the computed Lomb-Scargle
periodogram. When $\gamma = 1$ the sampling is regular; it becomes more and more irregular
as $\gamma \to 0$ or $\gamma \to \infty$. Here the case with M = 1000 sampling time instants is shown.
The distribution for the case M = 100 is similar.

Figure 7: Correlation coefficients $\rho$ (cf. Eq. (44)) of $a_f$ with $b_f$ (cf. Eqs. (36), (37)) against
the $\gamma$ parameter for a set of different frequencies $f$ and a number of sampling time instants
M = 100 (blue line) and M = 1000 (red line), distributed as shown in Fig. 5. In spite of
the extremely irregular sampling, significant $\rho$ occurs only for small data sets. But even
in this case the correlation is weak.

Figure 8: Distribution of the angles $a_j$ on the unit sphere for the set of frequencies $f$ as in
Fig. 7 and a number of sampling time instants M = 100 distributed as in the bottom-right
panel of Fig. 5.

Figure 9: Correlation coefficients $\rho$ (cf. Eq. (44)) of $a_f$ with $b_f$ (cf. Eqs. (36), (37))
against the $\gamma$ parameter for a set of different frequencies $f$ and a number of sampling time
instants M = 100 (blue line) and M = 1000 (red line), distributed as shown in Fig. 6.
In spite of the extremely irregular sampling, significant $\rho$ occurs only for small data sets.
But even in this case the correlation is weak.

Figure 10: Distribution of the angles $a_j$ on the unit sphere for the set of frequencies $f$ as in
Fig. 9 and a number of sampling time instants M = 100 distributed as in the bottom-right
panel of Fig. 6.

Figure 11: Correlation coefficients $\rho$ (cf. Eq. (44)) of $a_f$ with $b_f$ (cf. Eqs. (36), (37))
against the $\eta$ parameter for a set of frequencies $f$ and a number of sampling time instants
M = 500 generated in such a way as to simulate five observing sessions of duration $100\eta$,
each containing 100 equispaced data and covering a total fraction $\eta$ of the interval [0, 500].
Adjacent sessions are separated by a gap of length $100(1 - \eta)$. Correlations are quite small
except for $f = 0.02$ when $\eta = 0.1$.



[Plot panels labelled f = 0.004, f = 0.02 and f = 0.04.]




Figure 12: Distribution of the angles $a_j$ on the unit sphere for the set of frequencies $f$ and
the sampling time instants as in Fig. 11 for the case $\eta = 0.1$. Notice the distribution for
$f = 0.02$, which corresponds to the case when the correlation $\rho$ is high.

Figure 13: Top panel: original optical light curve of the transient low mass X-ray binary
EXO 0748-676. The time is in units of $\Delta t$ = 32 s; Central and bottom panels: the same
light curve with 20% and 70% of the data removed in such a way as to simulate 5 different
observing sessions with the same duration and spaced with gaps again with the same
duration.

Figure 14: Lomb-Scargle periodogram (LS) vs. the periodogram as given by Eq. (14) (here
indicated as "classic") corresponding to the mean-subtracted time series in Fig. 13. The
frequency is in units of $1/\Delta t$, with $\Delta t$ the median sampling time step of the original light
curve from which the time series have been obtained.

Figure 15: As in Fig. 14 but without the subtraction of the mean from the signal. Notice
that, unlike in Fig. 14, here the Lomb-Scargle periodogram (LS) is different from the
periodogram as given by Eq. (14) (here indicated as "classic").